Confusion modelling for automated lip-reading using weighted finite-state transducers

Authors

Dominic Howell

Barry-John Theobald

Stephen J. Cox

Abstract

Automated lip-reading involves recognising speech from only the visual signal. The accuracy of current state-of-the-art lip-reading systems is significantly lower than that obtained by acoustic speech recognisers. These poor results are most likely due to the lack of information about speech production that is available in the visual signal: for example, it is impossible to discriminate voiced and unvoiced sounds, or many places of articulation, from visual signals. Our approach to this problem is to regard the visual speech signal as having been produced by a speaker who has a reduced phonemic repertoire and to attempt to compensate for this. In this respect, visual speech is similar to dysarthric speech, which is produced by a speaker who has poor control over their articulators, leading them to speak with a reduced and distorted set of phonemes. In previous work, we found that the use of weighted finite-state transducers improved recognition performance on dysarthric speech considerably. In this paper, we report the results of applying this technique to lip-reading. The technique works, but our initial results are not as good as those obtained by using a conventional approach, and we discuss why this might be so and what the prospects for future investigation are.
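The idea of compensating for a reduced phonemic repertoire can be illustrated with a toy example. The sketch below is not the authors' system: it assumes a substitution-only confusion model with no insertions or deletions, uses negative log probabilities as arc costs (a common convention for weighted transducers), and all phoneme symbols, probabilities, and lexicon entries are invented for illustration. It scores each lexicon word against a recognised phoneme string and returns the lowest-cost word, which is roughly what composing a confusion transducer with a lexicon and taking the shortest path would do.

```python
import math

# Hypothetical confusion probabilities P(recognised | intended).
# These values are invented for illustration, not taken from the paper.
CONFUSION = {
    "p": {"p": 0.4, "b": 0.3, "m": 0.3},   # bilabials look alike on the lips
    "b": {"p": 0.3, "b": 0.4, "m": 0.3},
    "m": {"p": 0.3, "b": 0.3, "m": 0.4},
    "ae": {"ae": 0.9, "eh": 0.1},
    "eh": {"ae": 0.2, "eh": 0.8},
    "t": {"t": 0.6, "d": 0.4},             # voicing is not visible
    "d": {"t": 0.4, "d": 0.6},
}

# Hypothetical pronunciation lexicon (word -> intended phonemes).
LEXICON = {
    "pat": ["p", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "mat": ["m", "ae", "t"],
    "bed": ["b", "eh", "d"],
}


def arc_cost(intended, recognised):
    """Cost of one confusion arc: -log P(recognised | intended)."""
    p = CONFUSION.get(intended, {}).get(recognised, 0.0)
    return -math.log(p) if p > 0 else math.inf


def path_cost(pronunciation, recognised):
    """Total cost of explaining the recognised string with one word.

    Assumption: substitutions only, so lengths must match.
    """
    if len(pronunciation) != len(recognised):
        return math.inf
    return sum(arc_cost(i, r) for i, r in zip(pronunciation, recognised))


def decode(recognised):
    """Return the lowest-cost lexicon word and its cost."""
    costs = {w: path_cost(p, recognised) for w, p in LEXICON.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]


if __name__ == "__main__":
    # "m ae d" is not a word in the lexicon; the confusion model repairs it.
    print(decode(["m", "ae", "d"]))
```

A real weighted finite-state transducer would also allow insertions and deletions and would be composed with a language model; a toolkit such as OpenFst is the usual choice for that, but its use here is an assumption rather than something the abstract states.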

Similar resources

To build a model for implementing automated lip reading which involves Lip motion feature to text conversion

A speech recognition system has three major components: feature extraction, probabilistic modelling of features and classification. In the literature, the general approach is to extract the principal components of the lip movement in terms of lip-shape-based properties in order to establish a one-to-one correspondence between phonemes of speech and visemes of lip shape. Several modelling and cl...

Improving visual features for lip-reading

Automatic speech recognition systems that utilise the visual modality of speech are often investigated within a speaker-dependent or a multi-speaker paradigm. That is, during training the recogniser will have had prior exposure to example speech from each of the possible test speakers. In a previous paper we highlighted the danger of not using different speakers in the training and test sets, an...

Limitations of visual speech recognition

In this paper we investigate the limits of automated lip-reading systems and we consider the improvement that could be gained were additional information from other (non-visible) speech articulators available to the recogniser. Hidden Markov model (HMM) speech recognisers are trained using electromagnetic articulography (EMA) data drawn from the MOCHA-TIMIT data set. Articulatory information is...

Silence models in weighted finite-state transducers

We investigate the effects of different silence modelling strategies in Weighted Finite-State Transducers for Automatic Speech Recognition. We show that the choice of silence models, and the way they are included in the transducer, can have a significant effect on the size of the resulting transducer; we present a means to prevent particularly large silence overheads. Our conclusions include th...

Bimachines and Structurally-Reversed Automata

Although bimachines are not widely used in practice, they represent a central concept in the study of rational functions. Indeed, they are finite state machines specifically designed to implement rational word functions. Their modelling power is equal to that of single-valued finite transducers. From the theoretical point of view, bimachines reflect the decomposition of a rational function into...

Publication date: 2013
